The Perception and Production of Phones and Tones: The Role of Rigid and Non-Rigid Face and Head Motion

Authors

  • Denis Burnham
  • Jessica Reynolds
  • Eric Vatikiotis-Bateson
  • Hani Yehia
  • Valter Ciocca
  • Rua Haszard Morris
  • Harold Hill
  • Guillaume Vignali
  • Sandra Bollwerk
  • Helen Tam
  • Caroline Jones
Abstract

There is evidence, mostly with phones (consonants and vowels), that visual concomitants of articulation facilitate speech perception. Here the visual concomitants of lexical tone are considered. In tone languages, fundamental frequency variations signal lexical meaning. In a word identification experiment with auditory-visual words differing only in tone, Cantonese perceivers performed above chance in a Visual Only condition. A subsequent study showed augmentation of word pair discrimination in noise in an Auditory-Visual compared to an Auditory Only condition for Cantonese speakers, tonal Thai speakers, and even non-tone Australian speakers. The source of this perceptual information was sought in an OPTOTRAK production study of a Cantonese speaker. Functional Data Analysis (FDA) and Principal Component (PC) extraction suggest that the salient PCs for distinguishing tones involve rigid motion of the head rather than non-rigid face motion. Results of a final perception study using OPTOTRAK output, in which rigid or non-rigid motion could be presented independently in tone-differing or phone-differing conditions, suggest that non-rigid motion is most useful for the discrimination of phones, whereas rigid motion is most useful for the discrimination of tones.

1. Background

Speech is auditory-visual: whenever visual (lip, face, head, and neck motion) information is available, humans use it to augment and modify speech perception. Sumby and Pollack (1954) showed augmentation: accuracy increases of 40-80% when speech presented in a noisy environment is accompanied by the speaker’s face. McGurk and MacDonald (1976) showed modification: the McGurk effect, in which auditory [ba] paired with visual [ga] is perceived as “da” or “tha”. So, both auditory and visual speech information are important in speech perception, and it is argued that this is because convergent information better specifies the speech source (Vatikiotis-Bateson, Kuratate, Munhall, & Yehia, 2000).
The McGurk effect occurs in the tone languages Cantonese (de Gelder, Bertelson, Vroomen, & Chen, 1995) and Thai (Burnham, 1992), but only insofar as it affects phones (consonants and vowels). So, until recently, all the research on auditory-visual speech perception has concerned the perception of phones, with no consideration of visual information for tones. Tone is primarily based on F0, e.g., Cantonese has 6 tones, as in /fu55/ ‘husband’, /fu33/ ‘rich’, /fu22/ ‘father’, /fu25/ ‘tiger’, /fu21/ ‘to hold’, and /fu23/ ‘woman’. In pitch-accented languages tone is carried between syllables, e.g., Japanese has 2 pitch-accents: high-low, e.g., ka⎡ki ‘oyster’, and low-high, e.g., ka⎤ki ‘persimmon’. Auditory-visual speech perception may operate differently in tone languages: Japanese listeners’ perception in the McGurk effect is less influenced by visual speech than is that of their American counterparts (Sekiyama, 1994), and the effect is further reduced in Chinese perceivers (Sekiyama, 1997). Sekiyama reasons that, as there are 6 tones in Cantonese, 2 pitch-accents in Japanese, and none in English, these cross-language auditory-visual effects could result from the relative prevalence of tone, which presumably has few visual concomitants.

There are visual correlates of F0 in speech production. Cavé, Guaïtella, Bertrand, et al. (1998) showed a correlation between French speakers’ eyebrow motion and intonation in sentences, and there are strong correlations between head motion and F0 during speech (Yehia, Kuratate, & Vatikiotis-Bateson, 2002), which are continuous and seem to be used in auditory-visual perception (Vatikiotis-Bateson et al., 2000). However, studies of the visual concomitants of tone are lacking.
Auditory-visual perception and production of Cantonese tone are investigated here in two perception studies, one of Cantonese perceivers’ identification and one of Cantonese, Thai, and Australian perceivers’ discrimination of tone in auditory-only (AO), visual-only (VO), and auditory-visual (AV) modes, and in a production study measuring auditory and visual concomitants of phone and tone production. This led to the phone/non-rigid, tone/rigid hypothesis, which was tested in a final discrimination study with words presented with rigid, non-rigid, or combined motion in AO, VO, or AV conditions.

2. AV Perception of Tone: Preliminary Studies

Brief versions of two experiments are given here. For further details see Burnham, Ciocca, and Stokes (2001), and Burnham, Lau, Tam, and Schoknecht (2001), respectively.

2.1 AV Tone Identification: Cantonese Perceivers

Method: A 2 x 2 x 2 x (3 x 6 x 4 x 2) design was employed. The first three factors were group manipulations: phonetic background (participants with or without prior phonetic training); word presentation (isolated words or words in sentences); and feedback (feedback for correct responses or no feedback). The remaining within-subjects factors were presentation mode (AO, VO, or AV); tone (the 6 Cantonese tones: high (5-5), low-mid/high-rising (2-5), mid (3-3), low-mid/low-falling (2-1), low-mid/mid-rising (2-3), and low-mid (2-2)); phonemic string (4 Cantonese phonemic strings on which tones were carried: 2 with monophthongal vowels, /fu/ and /fan/, and 2 with diphthongs, /soej/ and /hau/); and repetitions (each of the above 72 combinations was presented twice). Forty-eight adult native Cantonese speakers were tested, 24 trained phoneticians and 24 non-phoneticians, with appropriate group counterbalancing. Stimuli were the 24 Cantonese words (/fu/, /fan/, /soej/, and /hau/ x 6 tones) spoken by a 23-year-old native Cantonese female, and recorded on
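
As a rough check on the within-subjects part of the design described in the Method paragraph (3 presentation modes x 6 tones x 4 phonemic strings x 2 repetitions), the following Python sketch enumerates the trials. All variable and function names are illustrative assumptions and are not taken from the study. Tone labels use the hyphenated contour notation, with the low-falling tone written as 2-1 on the assumption that it matches the contour in /fu21/.

```python
from itertools import product

# Within-subjects factors as listed in the Method paragraph.
# Names are hypothetical; only the factor levels come from the text.
MODES = ["AO", "VO", "AV"]                              # presentation mode
TONES = ["5-5", "2-5", "3-3", "2-1", "2-3", "2-2"]      # 6 Cantonese tones
STRINGS = ["fu", "fan", "soej", "hau"]                  # 4 phonemic strings
REPETITIONS = 2                                         # each combination twice


def build_trials():
    """Return one (mode, tone, string, repetition) tuple per trial."""
    return [
        (mode, tone, string, rep)
        for mode, tone, string in product(MODES, TONES, STRINGS)
        for rep in range(1, REPETITIONS + 1)
    ]


trials = build_trials()

# Distinct mode x tone x string combinations (ignoring repetition).
combinations = {(mode, tone, string) for mode, tone, string, _ in trials}

# Distinct spoken words: every phonemic string carried on every tone.
words = {(string, tone) for _, tone, string, _ in trials}

print("combinations:", len(combinations))
print("trials per participant:", len(trials))
print("distinct words:", len(words))
```

Under these assumptions the counts agree with the figures quoted in the text: 72 combinations, each presented twice, built from 24 distinct words.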


Related articles

Rigid vs non-rigid face and head motion in phone and tone perception

There is recent evidence that the visual concomitants, not only of the articulation of phones (consonants and vowels), but also of tones (fundamental frequency variations that signal lexical meaning in tone languages), facilitate speech perception. Analysis of speech production data from a Cantonese speaker suggests that the source of this perceptual information for tones involves rigid motion of t...


Dynamical Behavior of a Rigid Body with One Fixed Point (Gyroscope). Basic Concepts and Results. Open Problems: a Review

The study of the dynamic behavior of a rigid body with one fixed point (gyroscope) has a long history. A number of famous mathematicians and mechanical engineers have devoted enormous time and effort to clarify the role of dynamic effects on its movement (behavior) – stable, periodic, quasi-periodic or chaotic. The main objectives of this review are: 1) to outline the characteristic features of...


Evaluation of proximal tibial osteotomy by the lateral closing wedge (Coventry) method with anterior displacement of the distal fragment, rigid fixation with a T-plate, and early range of motion

Background: The objective of this study was to evaluate genu varum correction and the effect of displacing the distal osteotomy fragment, to compare it with other methods, and to determine its complications. Material and method: A total of 25 knees from 22 patients in 1381-1383 in Baghiatallah Hospital underwent proximal tibia osteotomy by lateral closing wedge (Coventry) method w...


Distribution of lateral active earth pressure on a rigid retaining wall under various motion modes

The design of retaining walls depends on the magnitude of active pressure exerted from the backfill. Therefore, estimating the scale of this pressure is a fundamental factor in the design. In this study, to assess the active earth pressure, a rigid retaining wall was built capable of translating and/or rotating with adjustable speed. Further, several physical tests were conducte...


Sound Wave Propagation in Viscous Liquid-Filled Non-Rigid Carbon Nanotube with Finite Length

In this paper, numerical results are obtained and explained from an exact formula for the sound pressure load due to the presence of liquid inside finite-length non-rigid carbon nanotubes (CNTs), which is coupled with the dynamic equations of motion for the CNT. To demonstrate the accuracy of this work, the obtained formula has been compared to what has been used by other research...



Publication date: 2006